Abstract

A classical result by Guibas and Odlyzko obtained in 1981 gives the generating function for the number 
of strings that avoid a given set of substrings with the property that no substring is contained in any of 
the others. In this paper, we give an analogue of this result for the enumeration of compositions that 
avoid a given set of prohibited substrings, subject to the compositions' length (number of parts) and 
weight. We also give examples of families of strings to be avoided that allow for an explicit formula 
for the generating function. Our results extend recent results by Myers on avoidance of strings in 
compositions subject to weight, but not length. 
1. Introduction

In 1981, Guibas and Odlyzko [1] obtained the generating function for the number of strings avoiding a given set of prohibited substrings, and then applied this result to non-transitive games. (A string $s = s_1 s_2 \cdots s_m$ contains a substring $b_1 b_2 \cdots b_k$ of length $k$ if there is an index $i$ such that $s_i s_{i+1} \cdots s_{i+k-1} = b_1 b_2 \cdots b_k$. Otherwise, we say that $s$ avoids the substring $b_1 b_2 \cdots b_k$.) Winterfjord later gave a detailed derivation of this generating function and related results for the binary case in his Master's thesis [5]. The derivation rests on two ideas: the correlation between two strings, and the fact that the strings avoiding the set of substrings can be enumerated in two different ways. Let $X_1 = a_0 a_1 \cdots a_{m-1}$ and $X_2 = b_0 b_1 \cdots b_{\ell-1}$ be two strings of lengths $m$ and $\ell$, respectively, over the alphabet $[n] = \{1, 2, \ldots, n\}$. The correlation $c_{12} = c_0 c_1 \cdots c_{m-1}$ is the binary string defined as follows:

$m \le \ell$: for $0 \le j \le m-1$, $c_j = 1$ if $a_i = b_{\ell-m+i+j}$ for $i = 0, 1, \ldots, m-j-1$, and $c_j = 0$ otherwise;
$m > \ell$: for $0 \le j \le m-\ell$, $c_j = 1$ if $b_i = a_{m-\ell+i-j}$ for $i = 0, 1, \ldots, \ell-1$, and $c_j = 0$ otherwise; for $m-\ell+1 \le j \le m-1$, $c_j = 1$ if $a_i = b_{\ell-m+i+j}$ for $i = 0, 1, \ldots, m-j-1$, and $c_j = 0$ otherwise.



1 The work presented here was supported by grant no. 090038011 from the Icelandic Research Fund. 
In plain English, $c_j$ equals 1 if and only if the letters agree throughout the overlap of the string $X_1$ with the string $X_2$. Here $X_2$ is first aligned with $X_1$ at the right end and then shifted (or offset) by $j$ positions to the left. Figure 1 illustrates this comparison.



[Figure 1. Comparing strings $X_1$ and $X_2$: the overlap of $X_1$ with the shifted $X_2$, and the tail of $X_1$ to the right of the overlap.]



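For example, if $X_1 = 110$ and $X_2 = 1011$, then $c_{12} = 011$ and $c_{21} = 0010$. Here the bit $c_1$ of $c_{12}$ equals 1 because the prefix 11 of $X_1$ coincides with the suffix 11 of $X_2$.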
In general $c_{12} \neq c_{21}$, and the two correlations have different lengths unless the strings have the same length. The autocorrelation of a string (or word) $X_1$ is just $c_{11}$, the correlation of $X_1$ with itself. For instance, if $X_1 = 1011$, then $c_{11} = 1001$. It is convenient to associate a correlation polynomial $C_{12}(q) = c_0 + c_1 q + \cdots + c_{k-1} q^{k-1}$ with the correlation $c_{12} = c_0 c_1 \cdots c_{k-1}$. This correlation polynomial is the generating function for the number of letters in the tail, which is the portion of $X_1$ to the right of the overlap (see Figure 1).
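The two index conditions can be checked mechanically. The Python sketch below is our own illustration, and the function name `correlation` is hypothetical (it does not come from [1] or [5]). It implements the two cases of the definition literally, with letters stored as characters, and reproduces the correlations from the example.

```python
def correlation(x1: str, x2: str) -> str:
    """Correlation c_0 c_1 ... c_{m-1} of x1 with x2, following the two cases above."""
    m, l = len(x1), len(x2)
    bits = []
    for j in range(m):
        if m > l and j <= m - l:
            # X2 fits entirely inside X1: compare all of X2 with a window of X1
            ok = all(x2[i] == x1[m - l + i - j] for i in range(l))
        else:
            # only a prefix of X1 (of length m - j) overlaps a suffix of X2
            ok = all(x1[i] == x2[l - m + i + j] for i in range(m - j))
        bits.append("1" if ok else "0")
    return "".join(bits)


print(correlation("110", "1011"))   # 011
print(correlation("1011", "110"))   # 0010
print(correlation("1011", "1011"))  # 1001
```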

We now state the general result of Guibas and Odlyzko [1] in the form given (for the special case of binary strings) in Winterfjord [5, Th. 24].

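Theorem 1.1. Let $q$ mark length. The generating function for the number of strings (or words) over the alphabet $[n]$ that avoid the substrings $S_1, \ldots, S_k$ is given below. Here $S_1, \ldots, S_k$ have lengths $\ell_1, \ldots, \ell_k$, respectively, and none of them is included in any other.

$$G(q) = \frac{\begin{vmatrix} -C_{11}(q) & \cdots & -C_{1k}(q) \\ \vdots & \ddots & \vdots \\ -C_{k1}(q) & \cdots & -C_{kk}(q) \end{vmatrix}}{\begin{vmatrix} 1-nq & 1 & \cdots & 1 \\ q^{\ell_1} & -C_{11}(q) & \cdots & -C_{1k}(q) \\ \vdots & \vdots & \ddots & \vdots \\ q^{\ell_k} & -C_{k1}(q) & \cdots & -C_{kk}(q) \end{vmatrix}},$$

where $C_{ij}(q)$ is the correlation polynomial for the substrings $S_i$ and $S_j$.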
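As a sanity check of our own (not taken from [1] or [5]), consider a single prohibited substring. The determinants are then $1 \times 1$ and $2 \times 2$, and expanding them gives the one-pattern formula below. Take $n = 2$ and $S_1 = 11$, whose autocorrelation polynomial is $C_{11}(q) = 1 + q$. The formula then produces the Fibonacci-type series that counts binary strings without two consecutive 1's:

$$G(q) = \frac{-C_{11}(q)}{-(1-nq)\,C_{11}(q) - q^{\ell_1}} = \frac{C_{11}(q)}{q^{\ell_1} + (1-nq)\,C_{11}(q)}, \qquad \frac{1+q}{q^2 + (1-2q)(1+q)} = \frac{1+q}{1-q-q^2} = 1 + 2q + 3q^2 + 5q^3 + 8q^4 + \cdots.$$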

Unfortunately, the approach of Guibas and Odlyzko does not apply to permutations and subpermutations, or to the avoidance of patterns (as opposed to strings). However, the approach does generalize to compositions avoiding a set of prohibited substrings. For this setting we derive a formula for the most general case, an analogue of the formula of Guibas and Odlyzko.² This generalization reflects the current interest in compositions, which have been studied from several perspectives in the literature, mostly from the viewpoint of pattern avoidance (see [2] and the references therein). Our results add a facet to this research.

Let $\mathbb{N}$ be the set of natural numbers. A composition $\sigma = \sigma_1 \cdots \sigma_m$ of $n \in \mathbb{N}$ is an ordered collection (or string) of one or more positive integers whose sum is $n$. This sum is also called the composition's weight $w(\sigma)$. The number of summands (or letters) $m$ is called the number of parts of the composition and is denoted by $\ell(\sigma)$. The main result of this paper is the derivation of the generating function

$$G(x,q) = G(S_1, \ldots, S_k; x, q) = \sum_{\sigma} x^{w(\sigma)} q^{\ell(\sigma)},$$

where the sum is taken over all compositions with parts in $\mathbb{N}$ that simultaneously avoid the prohibited substrings $S_i$, $i = 1, \ldots, k$, and none of the substrings is included in any other. The sum also includes the empty composition, which has weight 0 and length 0 and contributes the constant term 1. We state and prove this result in Section 2, and in Section 3 we apply it to families of prohibited substrings.



2. Main result 



To generalize Theorem 1.1 to compositions, the correlation polynomial must keep track of the weight of the tail as well as its length. We therefore define the correlation polynomial for a correlation $c_{ij} = c_0 c_1 \cdots c_{m-1}$ between $S_i = a_0 a_1 \cdots a_{m-1}$ and $S_j$ as

$$C_{ij}(x,q) = c_0 + c_1 x^{w(a_{m-1})} q + c_2 x^{w(a_{m-2}a_{m-1})} q^2 + \cdots + c_{m-1} x^{w(a_1 a_2 \cdots a_{m-1})} q^{m-1}.$$

For example, take $X_1 = 110$ and $X_2 = 1011$ from Section 1. Then $C_{12}(x,q) = q + xq^2$, $C_{21}(x,q) = (xq)^2$, $C_{11}(x,q) = 1$, and $C_{22}(x,q) = 1 + x^2q^3$. (These two strings contain the letter 0, so they are not compositions; they serve only to illustrate the definition.) For genuine compositions all parts are positive. Therefore every term of a correlation polynomial except the first is divisible by $xq$, and the first term is either 0 or 1. We are now ready to state the main result.
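Here is a minimal Python sketch of this definition, under conventions of our own. Compositions are tuples of parts, and a polynomial in $x$ and $q$ is a dictionary that maps the exponent pair (exponent of $x$, exponent of $q$) to its coefficient. The helper names are hypothetical. The zero parts appear only so that the example strings above can be reproduced.

```python
def correlation(si, sj):
    """Bits c_0, ..., c_{m-1} of the correlation of si with sj (the two cases of Section 1)."""
    m, l = len(si), len(sj)
    bits = []
    for j in range(m):
        if m > l and j <= m - l:
            bits.append(all(sj[i] == si[m - l + i - j] for i in range(l)))
        else:
            bits.append(all(si[i] == sj[l - m + i + j] for i in range(m - j)))
    return bits


def correlation_polynomial(si, sj):
    """C_ij(x, q) as {(exponent of x, exponent of q): coefficient}.

    A shift j with c_j = 1 contributes x^(weight of the last j parts of si) * q^j.
    """
    m = len(si)
    return {(sum(si[m - j:]), j): 1
            for j, bit in enumerate(correlation(si, sj)) if bit}


X1, X2 = (1, 1, 0), (1, 0, 1, 1)
print(correlation_polynomial(X1, X2))  # {(0, 1): 1, (1, 2): 1}, i.e. q + x q^2
print(correlation_polynomial(X2, X2))  # {(0, 0): 1, (2, 3): 1}, i.e. 1 + x^2 q^3
```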

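Theorem 2.1. Let $G(x,q)$ be the generating function in which the coefficient of $x^n q^m$ is the number of compositions of weight $n$ and length $m$, with parts in $\mathbb{N}$, that avoid the substrings $S_1, \ldots, S_k$. Here $S_1, \ldots, S_k$ have lengths $\ell(S_1), \ldots, \ell(S_k)$, respectively, and none of them is included in any other. Then

$$G(x,q) = \frac{(1-x)\begin{vmatrix} -C_{11}(x,q) & \cdots & -C_{1k}(x,q) \\ \vdots & \ddots & \vdots \\ -C_{k1}(x,q) & \cdots & -C_{kk}(x,q) \end{vmatrix}}{\begin{vmatrix} 1-x(1+q) & 1-x & \cdots & 1-x \\ x^{w(S_1)}q^{\ell(S_1)} & -C_{11}(x,q) & \cdots & -C_{1k}(x,q) \\ \vdots & \vdots & \ddots & \vdots \\ x^{w(S_k)}q^{\ell(S_k)} & -C_{k1}(x,q) & \cdots & -C_{kk}(x,q) \end{vmatrix}}, \tag{2.1}$$

where $C_{ij}(x,q)$ are the correlation polynomials defined above.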
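For a single prohibited substring ($k = 1$) the determinants are again small. In this case (2.1) reduces to the first identity below; this specialization is our own illustration. Take $S_1 = 1$, so that $w(S_1) = \ell(S_1) = 1$ and $C_{11}(x,q) = 1$. The result is the generating function of compositions with all parts at least 2, which also equals $\sum_{m \ge 0} \big(x^2 q/(1-x)\big)^m$:

$$G(x,q) = \frac{(1-x)\,C_{11}(x,q)}{(1-x-xq)\,C_{11}(x,q) + (1-x)\,x^{w(S_1)}q^{\ell(S_1)}}, \qquad S_1 = 1:\quad G(x,q) = \frac{1-x}{(1-x-xq) + (1-x)xq} = \frac{1-x}{1-x-x^2q}.$$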



2 As a matter of fact, a recent paper by Myers [4] considers a very similar problem. However, we are able to control both length and weight in compositions, as opposed to just weight, while Myers' result is more general with respect to the alphabet considered.
Proof. To find $G(x,q)$ we adapt the arguments in [1, 5] to compositions. Let $A$ denote the set of all compositions avoiding the prohibited substrings. For $i = 1, \ldots, k$, let $B_i$ be the set of all compositions that end with $S_i$ but have no other occurrence of any of the prohibited substrings. A composition in $B_i$ is said to quasi-avoid $S_i$. We denote the generating function of $B_i$ by $B_i(x,q)$ and note that $G(x,q)$ is the generating function of the set $A$. Furthermore, the sets $A, B_1, \ldots, B_k$ are pairwise disjoint, because none of the substrings is included in any of the others.

We now derive recurrences for certain sets of compositions. Compositions of weight $n+1$ can be created recursively from those of weight $n \ge 0$ in two ways: either increase the last part by 1 (if the composition is non-empty), or append a part 1 at the right end. For a set of compositions $M$, let $M^{+1}$ denote the set obtained from $M$ by increasing the rightmost part of each non-empty composition by 1. Let $M \times \{1\}$ denote the set obtained from $M$ by adjoining a new rightmost part 1 to each composition in $M$. With this notation, the set of compositions that either avoid or quasi-avoid the substrings can be expressed as follows:

$$A \cup B_1 \cup \cdots \cup B_k = \{\varepsilon\} \cup \big((A \cup B_1 \cup \cdots \cup B_k) \setminus \{\varepsilon\}\big)^{+1} \cup (A \times \{1\}), \tag{2.2}$$

where $\varepsilon$ is the empty composition. The right-hand side is justified as follows. Increasing the last part of a composition that avoids all substrings can create an occurrence of a substring, but only at the very end of the composition. The same holds when a new part is appended. On the other hand, increasing the last part of a composition that quasi-avoids a substring yields a composition that either avoids the substrings or quasi-avoids a different substring. However, appending the part 1 to a composition that quasi-avoids $S_i$ creates a composition that contains $S_i$ at a position other than the end, so this operation is not allowed for the sets $B_i$. Increasing the last part raises the weight of the composition by 1 and leaves the number of parts unchanged. Appending the part 1 raises both the weight and the length by 1. Thus (2.2) can be expressed in terms of generating functions as

$$(1 - x - xq)\,G(x,q) + (1-x)\big(B_1(x,q) + \cdots + B_k(x,q)\big) = 1 - x. \tag{2.3}$$

This uses two facts: the generating function of a union of disjoint sets is the sum of the respective generating functions, and the generating function of a Cartesian product is the product of the respective generating functions.
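For completeness, we write out the bookkeeping behind (2.3) as a short derivation of our own. The three sets on the right-hand side of (2.2) are disjoint, since their elements are, respectively, empty, ending in a part at least 2, and ending in a part 1. Writing $G = G(x,q)$ and $B_i = B_i(x,q)$, these sets contribute $1$, $x\big(G + \sum_i B_i - 1\big)$ and $xq\,G$, respectively:

$$G + \sum_{i=1}^{k} B_i = 1 + x\Big(G + \sum_{i=1}^{k} B_i - 1\Big) + xq\,G \quad\Longleftrightarrow\quad (1 - x - xq)\,G + (1-x)\sum_{i=1}^{k} B_i = 1 - x.$$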

We now establish a second connection between the sets $A$ and $B_i$. For $i = 1, \ldots, k$, let $R_i$ denote the set of compositions that consist of a composition from $A$ followed by the prohibited string $S_i$. The sets $R_i$ and $R_j$ are disjoint for $i \neq j$, since none of the $S_i$ is included in any other. Note that $R_i$ is not identical to $B_i$, because a prohibited string may occur inside a string in $R_i$, not only at the end. For a composition (or string) $X$ from $B_j$, we call a string $Y$ with $\ell(Y) \le \ell(S_i) - 1$ a possible $ij$-tail if $XY$ ends with the substring $S_i$. The name becomes clear when Figure 2 is compared with Figure 1: $Y$ is the tail in the comparison of $S_i$ with $S_j$.



[Figure 2. The $ij$-tail $Y$.]
With this definition, we obtain the following equality of sets:

$$A \times \{S_i\} = \bigcup_{1 \le j \le k} B_j \times \{\text{possible } ij\text{-tails}\}. \tag{2.4}$$

In terms of generating functions, this gives the following equation for each $i = 1, \ldots, k$:

$$G(x,q)\, x^{w(S_i)} q^{\ell(S_i)} - \sum_{j=1}^{k} C_{ij}(x,q)\, B_j(x,q) = 0. \tag{2.5}$$

Indeed, the proof of (2.4) is identical to that of the corresponding statement for strings in [1, 5]; whether we deal with strings or compositions makes no difference here. For the generating functions, the only difference is that we also record the weight of the compositions with the variable $x$.
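The identities (2.3) and (2.5) can also be tested numerically before they are combined. The Python sketch below is our own check and is not part of the proof. It sorts every composition of weight at most $N$ into $A$, $B_1, \ldots, B_k$, or neither, using the sample family $\{22, 212\}$, which we chose for illustration. It then checks that both identities hold coefficient by coefficient up to $x^N$. All function names are hypothetical.

```python
from collections import defaultdict

N = 9                       # compare coefficients of x^n for all n <= N
SUBS = [(2, 2), (2, 1, 2)]  # sample prohibited substrings, none contained in another

def compositions(n):
    """Yield every composition of n as a tuple of parts (the empty tuple for n = 0)."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def occurrences(sigma, s):
    """Start indices at which s occurs as a block of consecutive parts of sigma."""
    return [i for i in range(len(sigma) - len(s) + 1) if sigma[i:i + len(s)] == s]

def correlation_polynomial(si, sj):
    """C_ij(x, q) as {(x_exp, q_exp): 1}, using the index conditions of Section 1."""
    m, l = len(si), len(sj)
    poly = {}
    for j in range(m):
        if m > l and j <= m - l:
            ok = all(sj[i] == si[m - l + i - j] for i in range(l))
        else:
            ok = all(si[i] == sj[l - m + i + j] for i in range(m - j))
        if ok:
            poly[(sum(si[m - j:]), j)] = 1
    return poly

def mul(p, r):
    """Product of two series {(x_exp, q_exp): coeff}, dropping x-degrees above N."""
    out = defaultdict(int)
    for (a1, b1), v1 in p.items():
        for (a2, b2), v2 in r.items():
            if a1 + a2 <= N:
                out[(a1 + a2, b1 + b2)] += v1 * v2
    return out

def combine(*terms):
    """Sum of c * p over the (c, p) pairs, with zero coefficients removed."""
    out = defaultdict(int)
    for c, p in terms:
        for key, v in p.items():
            out[key] += c * v
    return {key: v for key, v in out.items() if v}

# Sort every composition of weight <= N into A, B_1, ..., B_k, or neither.
G = defaultdict(int)
B = [defaultdict(int) for _ in SUBS]
for n in range(N + 1):
    for sigma in compositions(n):
        hits = [(i, p) for i, s in enumerate(SUBS) for p in occurrences(sigma, s)]
        if not hits:
            G[(n, len(sigma))] += 1                # sigma avoids everything: in A
        elif len(hits) == 1 and hits[0][1] + len(SUBS[hits[0][0]]) == len(sigma):
            B[hits[0][0]][(n, len(sigma))] += 1    # single occurrence, at the end

one_minus_x = {(0, 0): 1, (1, 0): -1}
# (2.3): (1 - x - xq) G + (1 - x) sum_j B_j - (1 - x) should leave no terms.
ok_23 = not combine((1, mul({(0, 0): 1, (1, 0): -1, (1, 1): -1}, G)),
                    *[(1, mul(one_minus_x, Bj)) for Bj in B],
                    (-1, one_minus_x))
# (2.5): G x^{w(S_i)} q^{l(S_i)} - sum_j C_ij B_j should leave no terms, for each i.
ok_25 = [not combine((1, mul({(sum(si), len(si)): 1}, G)),
                     *[(-1, mul(correlation_polynomial(si, sj), B[j]))
                       for j, sj in enumerate(SUBS)])
         for si in SUBS]
print(ok_23, ok_25)
```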

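Combining (2.3) and (2.5) yields the following system of linear equations in $G(x,q), B_1(x,q), \ldots, B_k(x,q)$:

$$\begin{pmatrix} 1-x(1+q) & 1-x & \cdots & 1-x \\ x^{w(S_1)}q^{\ell(S_1)} & -C_{11}(x,q) & \cdots & -C_{1k}(x,q) \\ \vdots & \vdots & \ddots & \vdots \\ x^{w(S_k)}q^{\ell(S_k)} & -C_{k1}(x,q) & \cdots & -C_{kk}(x,q) \end{pmatrix}\begin{pmatrix} G(x,q) \\ B_1(x,q) \\ \vdots \\ B_k(x,q) \end{pmatrix} = \begin{pmatrix} 1-x \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Solving for $G(x,q)$ by Cramer's rule gives formula (2.1). Indeed, replacing the first column of the coefficient matrix by the right-hand side and expanding along that column produces the numerator $(1-x)\det\big(-C_{ij}(x,q)\big)$. □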
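To see formula (2.1) at work, the following Python sketch (our own illustration; all helper names are hypothetical) expands both determinants as polynomials in $x$ and $q$. It then divides the resulting series up to $x^N$ and compares the outcome with a brute-force count. The determinants are computed by cofactor expansion, which is adequate only for the small $k$ used here. The division relies on the fact that the $x^0$ part of the denominator determinant is the constant $(-1)^k$.

```python
from collections import defaultdict

N = 8  # truncate every series at x-degree (weight) N

def correlation_polynomial(si, sj):
    """C_ij(x, q) as {(x_exp, q_exp): 1}; same index conditions as in Section 1."""
    m, l = len(si), len(sj)
    poly = {}
    for j in range(m):
        if m > l and j <= m - l:
            ok = all(sj[i] == si[m - l + i - j] for i in range(l))
        else:
            ok = all(si[i] == sj[l - m + i + j] for i in range(m - j))
        if ok:
            poly[(sum(si[m - j:]), j)] = 1
    return poly

def add(p, r, c=1):
    """p + c*r for series stored as {(x_exp, q_exp): coeff}."""
    out = defaultdict(int, p)
    for key, v in r.items():
        out[key] += c * v
    return {key: v for key, v in out.items() if v}

def mul(p, r):
    """Product of two series, dropping terms of x-degree above N."""
    out = defaultdict(int)
    for (a1, b1), v1 in p.items():
        for (a2, b2), v2 in r.items():
            if a1 + a2 <= N:
                out[(a1 + a2, b1 + b2)] += v1 * v2
    return {key: v for key, v in out.items() if v}

def det(M):
    """Cofactor expansion along the first row (fine for the small k used here)."""
    if len(M) == 1:
        return M[0][0]
    total = {}
    for col, entry in enumerate(M[0]):
        minor = [row[:col] + row[col + 1:] for row in M[1:]]
        total = add(total, mul(entry, det(minor)), (-1) ** col)
    return total

def divide(num, den):
    """Series num/den in x, assuming the x^0 part of den is the constant +1 or -1."""
    d0 = den[(0, 0)]
    assert d0 in (1, -1) and all(key == (0, 0) for key in den if key[0] == 0)
    quotient = {}
    for a in range(N + 1):
        rest = {key: v for key, v in num.items() if key[0] == a}
        for (b, e), v in den.items():
            if 0 < b <= a:
                lower = {key: w for key, w in quotient.items() if key[0] == a - b}
                rest = add(rest, mul({(b, e): v}, lower), -1)
        quotient = add(quotient, rest, d0)  # dividing by d0 = ±1 equals multiplying by d0
    return quotient

def theorem_2_1(subs):
    """Truncated expansion of formula (2.1) for the prohibited substrings subs."""
    k = len(subs)
    C = [[add({}, correlation_polynomial(si, sj), -1) for sj in subs] for si in subs]
    one_minus_x = {(0, 0): 1, (1, 0): -1}
    top = [{(0, 0): 1, (1, 0): -1, (1, 1): -1}] + [one_minus_x] * k
    rows = [[{(sum(s), len(s)): 1}] + C[i] for i, s in enumerate(subs)]
    return divide(mul(one_minus_x, det(C)), det([top] + rows))

def compositions(n):
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def brute_force(subs):
    """Count avoiding compositions of weight <= N by (weight, number of parts)."""
    counts = defaultdict(int)
    for n in range(N + 1):
        for sigma in compositions(n):
            if not any(sigma[i:i + len(s)] == s
                       for s in subs for i in range(len(sigma) - len(s) + 1)):
                counts[(n, len(sigma))] += 1
    return dict(counts)

subs = [(2, 2), (2, 1, 2)]
print(theorem_2_1(subs) == brute_force(subs))
```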



3. Applications of Theorem 2.1



Theorem 2.1 provides an explicit solution to the enumerative problem, but it requires evaluating determinants, which may not be easy. When there are only a few prohibited substrings, explicit formulas for the generating function that avoid determinants are easy to find. It is therefore of interest to know for which families of prohibited substrings the determinants can be evaluated in general. In this section we evaluate the determinants for one such family. This family generalizes the well-based sets used in [3] to count independent sets in certain graphs called path-schemes.

Let $1^i$ denote the string consisting of $i$ 1's, and let $\mathcal{V} = \bigcup_{1 \le i \le k} \{21^{a_i-1}2\}$, with $1 \le a_1 < a_2 < \cdots < a_k$, be the set of substrings to be avoided. None of the substrings in $\mathcal{V}$ is included in any other. We can therefore apply formula (2.1) to find the generating function for the number of compositions that avoid all the substrings in $\mathcal{V}$ simultaneously.

Corollary 3.1. Let $V(x,q)$ be the generating function for the number of compositions of weight $n$ and length $m$, with parts in $\mathbb{N}$, that avoid the family of substrings $\mathcal{V}$ defined above. Then

$$V(x,q) = \frac{(1-x)\Big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\Big)}{\big(1 - x(1+q) + (1-x)x^2q\big)\Big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\Big) - (1-x)x^2q}. \tag{3.1}$$
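For a single prohibited substring ($k = 1$, $\mathcal{V} = \{21^{a_1-1}2\}$), (3.1) can be checked directly against the case $k = 1$ of (2.1). That case reads $G = (1-x)C_{11}/\big((1-x-xq)C_{11} + (1-x)x^{w(S_1)}q^{\ell(S_1)}\big)$. Write $C = 1 + x(xq)^{a_1}$, which is the value of $C_{11}(x,q)$ for this substring. A short computation of our own then rewrites the denominator of (3.1) as

$$\big(1 - x(1+q) + (1-x)x^2q\big)C - (1-x)x^2q = (1 - x - xq)\,C + (1-x)x^2q\,(C - 1) = (1 - x - xq)\,C + (1-x)\,x^{a_1+3}q^{a_1+1}.$$

This matches the denominator above with $w(S_1) = a_1 + 3$ and $\ell(S_1) = a_1 + 1$.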



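Proof. The correlation polynomial for the two strings $21^{a_i-1}2$ and $21^{a_j-1}2$ is $C_{ij}(x,q) = \delta_{ij} + x(xq)^{a_i}$, where $\delta_{ij}$ is the Kronecker delta. To see this, note that apart from the full overlap when $i = j$, the only overlap is the single letter 2. This overlap leaves the tail $1^{a_i-1}2$, which has weight $a_i + 1$ and length $a_i$. Also, $x^{w(21^{a_i-1}2)} q^{\ell(21^{a_i-1}2)} = x^{a_i+3} q^{a_i+1}$.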
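The form of $C_{ij}(x,q)$ can be confirmed for concrete exponents with the Python sketch below, which is our own check (`family` and `check_family` are hypothetical names). It encodes $21^{a-1}2$ as a tuple of parts and compares the computed correlation polynomials with $\delta_{ij} + x(xq)^{a_i}$. This polynomial is stored as the exponent pair $(a_i + 1, a_i)$, plus $(0, 0)$ when $i = j$.

```python
def correlation_polynomial(si, sj):
    """C_ij(x, q) as {(x_exp, q_exp): 1}, using the index conditions of Section 1."""
    m, l = len(si), len(sj)
    poly = {}
    for j in range(m):
        if m > l and j <= m - l:
            ok = all(sj[i] == si[m - l + i - j] for i in range(l))
        else:
            ok = all(si[i] == sj[l - m + i + j] for i in range(m - j))
        if ok:
            poly[(sum(si[m - j:]), j)] = 1
    return poly

def family(a_values):
    """Hypothetical helper: the substrings 2 1^(a-1) 2 for the given a's, as tuples."""
    return [(2,) + (1,) * (a - 1) + (2,) for a in a_values]

def check_family(a_values):
    """True if C_ij = delta_ij + x (xq)^{a_i} and x^w q^l = x^{a_i+3} q^{a_i+1} for all i, j."""
    subs = family(a_values)
    for i, (ai, si) in enumerate(zip(a_values, subs)):
        if (sum(si), len(si)) != (ai + 3, ai + 1):
            return False
        for j, sj in enumerate(subs):
            expected = {(ai + 1, ai): 1}      # x (xq)^{a_i} = x^{a_i+1} q^{a_i}
            if i == j:
                expected[(0, 0)] = 1          # Kronecker delta term
            if correlation_polynomial(si, sj) != expected:
                return False
    return True

print(check_family([1, 2, 4]), check_family([1, 3, 5]), check_family([2, 4, 6]))
```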
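Therefore Theorem 2.1 gives that

$$V(x,q) = \frac{(1-x)\begin{vmatrix} -1-x(xq)^{a_1} & -x(xq)^{a_1} & \cdots & -x(xq)^{a_1} \\ -x(xq)^{a_2} & -1-x(xq)^{a_2} & \cdots & -x(xq)^{a_2} \\ \vdots & \vdots & \ddots & \vdots \\ -x(xq)^{a_k} & -x(xq)^{a_k} & \cdots & -1-x(xq)^{a_k} \end{vmatrix}}{\begin{vmatrix} 1-x(1+q) & 1-x & 1-x & \cdots & 1-x \\ x^{a_1+3}q^{a_1+1} & -1-x(xq)^{a_1} & -x(xq)^{a_1} & \cdots & -x(xq)^{a_1} \\ x^{a_2+3}q^{a_2+1} & -x(xq)^{a_2} & -1-x(xq)^{a_2} & \cdots & -x(xq)^{a_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x^{a_k+3}q^{a_k+1} & -x(xq)^{a_k} & -x(xq)^{a_k} & \cdots & -1-x(xq)^{a_k} \end{vmatrix}}.$$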



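To compute the determinant in the numerator, replace row 1 by the sum of all rows and factor out the common factor $-\big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\big)$. Next, subtract column 1 from columns $2, 3, \ldots, k$ to obtain

$$-\Big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\Big)\begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ -x(xq)^{a_2} & -1 & 0 & \cdots & 0 \\ -x(xq)^{a_3} & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -x(xq)^{a_k} & 0 & 0 & \cdots & -1 \end{vmatrix} = (-1)^k\Big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\Big).$$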
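As an independent check of this evaluation, the Python sketch below (our own; the function names are hypothetical) computes the numerator determinant exactly with `fractions.Fraction`. It does so at random rational values of $x$ and $q$ and random exponents $a_1 < \cdots < a_k$, and compares the result with the closed form. This is only a numerical spot check at the sampled values, not a proof.

```python
from fractions import Fraction
import random

def det(M):
    """Exact determinant of a square matrix of Fractions by Gaussian elimination."""
    M = [list(row) for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result                  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for cc in range(c, n):
                M[r][cc] -= factor * M[c][cc]
    return result

def numerator_matrix(x, q, a):
    """Entries -delta_ij - x (xq)^{a_i} of the numerator determinant."""
    k = len(a)
    return [[-(i == j) - x * (x * q) ** a[i] for j in range(k)] for i in range(k)]

def check_numerator(trials, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(1, 9), rng.randint(2, 11))
        q = Fraction(rng.randint(1, 9), rng.randint(2, 11))
        a = sorted(rng.sample(range(1, 10), rng.randint(1, 5)))
        closed_form = (-1) ** len(a) * (1 + x * sum((x * q) ** ai for ai in a))
        if det(numerator_matrix(x, q, a)) != closed_form:
            return False
    return True

print(check_numerator(20))
```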



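To compute the determinant in the denominator, replace column 1 by the sum of column 1 and $x^2q \cdot (\text{column } k+1)$. Then, for $i = 2, 3, \ldots, k$, replace column $i$ by the difference of column $i$ and column $k+1$. This yields

$$\begin{vmatrix} 1 - x(1+q) + (1-x)x^2q & 0 & 0 & \cdots & 0 & 1-x \\ 0 & -1 & 0 & \cdots & 0 & -x(xq)^{a_1} \\ 0 & 0 & -1 & \cdots & 0 & -x(xq)^{a_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & -x(xq)^{a_{k-1}} \\ -x^2q & 1 & 1 & \cdots & 1 & -1 - x(xq)^{a_k} \end{vmatrix}.$$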



To obtain an upper triangular matrix, we replace the last row in this determinant by

$$\frac{x^2q\,(\text{row } 1)}{1 - x(1+q) + (1-x)x^2q} + (\text{row } 2) + (\text{row } 3) + \cdots + (\text{row } k+1).$$

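After this replacement, the diagonal entries are $1 - x(1+q) + (1-x)x^2q$, then $k-1$ entries equal to $-1$, and finally $-1 - x\sum_{i=1}^{k}(xq)^{a_i} + (1-x)x^2q/\big(1 - x(1+q) + (1-x)x^2q\big)$. Hence the determinant of the denominator equals

$$(-1)^k\Big[\big(1 - x(1+q) + (1-x)x^2q\big)\Big(1 + x\sum_{i=1}^{k}(xq)^{a_i}\Big) - (1-x)x^2q\Big],$$

which completes the proof. □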
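The same kind of exact spot check applies to the denominator, including its sign. The Python sketch below is ours, and its names are hypothetical. It builds the $(k+1) \times (k+1)$ denominator matrix from the proof at random rational $x$ and $q$ and compares its exact determinant with the closed form above. Again, this is a numerical check at sampled values, not a proof.

```python
from fractions import Fraction
import random

def det(M):
    """Exact determinant of a square matrix of Fractions by Gaussian elimination."""
    M = [list(row) for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result                  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for cc in range(c, n):
                M[r][cc] -= factor * M[c][cc]
    return result

def denominator_matrix(x, q, a):
    """First row 1 - x(1+q), 1 - x, ...; row i: x^{a_i+3} q^{a_i+1}, -delta_ij - x (xq)^{a_i}."""
    k = len(a)
    top = [1 - x * (1 + q)] + [1 - x] * k
    rows = [[x ** (ai + 3) * q ** (ai + 1)] + [-(i == j) - x * (x * q) ** ai for j in range(k)]
            for i, ai in enumerate(a)]
    return [top] + rows

def check_denominator(trials, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(1, 9), rng.randint(2, 11))
        q = Fraction(rng.randint(1, 9), rng.randint(2, 11))
        a = sorted(rng.sample(range(1, 10), rng.randint(1, 5)))
        D = 1 - x * (1 + q) + (1 - x) * x ** 2 * q
        S = 1 + x * sum((x * q) ** ai for ai in a)
        closed_form = (-1) ** len(a) * (D * S - (1 - x) * x ** 2 * q)
        if det(denominator_matrix(x, q, a)) != closed_form:
            return False
    return True

print(check_denominator(20))
```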



Further simplifications of $V(x,q)$ are possible whenever $\sum_{i=1}^{k}(xq)^{a_i}$ can be simplified. We give three examples.
Example 3.2. The set of prohibited substrings $\{22, 212, \ldots, 21^{k-1}2\}$ corresponds to $\{a_1, a_2, \ldots, a_k\} = \{1, 2, \ldots, k\}$. In this case, (3.1) reduces to

$$V_k(x,q) = \frac{(1-x)\big(1 - xq + x^2q(1-(xq)^k)\big)}{\big(1 - x(1+q) + (1-x)x^2q\big)\big(1 - xq + x^2q(1-(xq)^k)\big) - (1-x)(1-xq)x^2q}.$$
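The closed form can be expanded mechanically. The Python sketch below is ours, and `v_k`, `series` and the other helpers are hypothetical names. It expands $V_k(x,q)$ as a power series in $x$ whose coefficients are polynomials in $q$. Because the $x^0$ part of the denominator is exactly 1, ordinary long division is enough. For $k = 2$ the expansion should reproduce the initial values of $V_2(x,q)$ listed below.

```python
from collections import defaultdict

N = 8  # expand up to x^N (weight N)

def add(p, r, c=1):
    """Return p + c*r for polynomials/series stored as {(x_exp, q_exp): coeff}."""
    out = defaultdict(int, p)
    for key, v in r.items():
        out[key] += c * v
    return {key: v for key, v in out.items() if v}

def mul(p, r):
    """Product of p and r, dropping terms of x-degree above N."""
    out = defaultdict(int)
    for (a1, b1), v1 in p.items():
        for (a2, b2), v2 in r.items():
            if a1 + a2 <= N:
                out[(a1 + a2, b1 + b2)] += v1 * v2
    return {key: v for key, v in out.items() if v}

def series(num, den):
    """Power series num/den in x; assumes the x^0 part of den is exactly 1."""
    assert den.get((0, 0)) == 1 and all(key == (0, 0) for key in den if key[0] == 0)
    quotient = {}
    for a in range(N + 1):
        rest = {key: v for key, v in num.items() if key[0] == a}
        for (b, e), v in den.items():
            if 0 < b <= a:
                lower = {key: w for key, w in quotient.items() if key[0] == a - b}
                rest = add(rest, mul({(b, e): v}, lower), -1)
        quotient = add(quotient, rest)
    return quotient

def v_k(k):
    """Series of V_k(x, q) from Example 3.2 (prohibited substrings 22, 212, ..., 2 1^(k-1) 2)."""
    one_minus_x = {(0, 0): 1, (1, 0): -1}
    P = {(0, 0): 1, (1, 1): -1, (2, 1): 1, (k + 2, k + 1): -1}       # 1 - xq + x^2 q (1 - (xq)^k)
    D = {(0, 0): 1, (1, 0): -1, (1, 1): -1, (2, 1): 1, (3, 1): -1}  # 1 - x(1+q) + (1-x) x^2 q
    E = mul(one_minus_x, {(2, 1): 1, (3, 2): -1})                   # (1-x)(1-xq) x^2 q
    return series(mul(one_minus_x, P), add(mul(D, P), E, -1))

def coefficient_of_x(s, n):
    """Coefficient of x^n in the series s, as {power of q: count}."""
    return {e: v for (a, e), v in sorted(s.items()) if a == n}

V2 = v_k(2)
for n in range(7):
    print(n, coefficient_of_x(V2, n))
```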

The initial values of $V_2(x,q)$ (avoiding 22 and 212) are as follows:

$$\begin{aligned} V_2(x,q) = {} & 1 + qx + (q + q^2)x^2 + (q + 2q^2 + q^3)x^3 + (q + 2q^2 + 3q^3 + q^4)x^4 \\ & + (q + 4q^2 + 3q^3 + 4q^4 + q^5)x^5 + (q + 5q^2 + 9q^3 + 5q^4 + 5q^5 + q^6)x^6 + \cdots \end{aligned}$$

Example 3.3. The set of prohibited substrings with an even number of 1's, $\{22, 2112, \ldots, 21^{2k}2\}$, is represented by the set $\{a_1, a_2, \ldots\} = \{1, 3, 5, \ldots, 2k+1\}$. In this case, (3.1) simplifies as follows:

$$V_o(x,q) = \frac{(1-x)\big(1 - (xq)^2 + x^2q(1-(xq)^{2k+2})\big)}{\big(1 - (1+q)x + (1-x)x^2q\big)\big(1 - (xq)^2 + x^2q(1-(xq)^{2k+2})\big) - (1-x)x^2q\big(1-(xq)^2\big)}.$$
The initial values of $V_o(x,q)$ for $k = 2$ (avoiding $\{22, 2112, 211112\}$) are as follows:

$$\begin{aligned} V_o(x,q) = {} & 1 + xq + (q + q^2)x^2 + (q + 2q^2 + q^3)x^3 + (q + 2q^2 + 3q^3 + q^4)x^4 \\ & + (q + 4q^2 + 4q^3 + 4q^4 + q^5)x^5 + (q + 5q^2 + 9q^3 + 6q^4 + 5q^5 + q^6)x^6 \\ & + (q + 6q^2 + 13q^3 + 16q^4 + 9q^5 + 6q^6 + q^7)x^7 \\ & + (q + 7q^2 + 19q^3 + 28q^4 + 26q^5 + 12q^6 + 7q^7 + q^8)x^8 + \cdots. \end{aligned}$$

Example 3.4. The set of prohibited substrings with an odd number of 1's, $\{212, 21112, \ldots, 21^{2k-1}2\}$, is represented by the set $\{a_1, a_2, \ldots\} = \{2, 4, 6, \ldots, 2k\}$. In this case, (3.1) simplifies as follows:

$$V_e(x,q) = \frac{(1-x)\big(1 - (xq)^2 + x^3q^2(1-(xq)^{2k})\big)}{\big(1 - (1+q)x + (1-x)x^2q\big)\big(1 - (xq)^2 + x^3q^2(1-(xq)^{2k})\big) - (1-x)x^2q\big(1-(xq)^2\big)}.$$
The initial values of $V_e(x,q)$ for $k = 2$ (avoiding $\{212, 21112\}$) are as follows:

$$\begin{aligned} V_e(x,q) = {} & 1 + xq + (q + q^2)x^2 + (q + 2q^2 + q^3)x^3 + (q + 3q^2 + 3q^3 + q^4)x^4 \\ & + (q + 4q^2 + 5q^3 + 4q^4 + q^5)x^5 + (q + 5q^2 + 10q^3 + 8q^4 + 5q^5 + q^6)x^6 \\ & + (q + 6q^2 + 15q^3 + 18q^4 + 11q^5 + 6q^6 + q^7)x^7 \\ & + (q + 7q^2 + 21q^3 + 33q^4 + 30q^5 + 15q^6 + 7q^7 + q^8)x^8 + \cdots. \end{aligned}$$
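The listed initial values can also be double-checked by brute force, independently of the generating functions. The Python sketch below is ours, and its helper names are hypothetical. It enumerates all compositions of weight $n \le 8$ and, for each of the three families with $k = 2$, counts the avoiding compositions by number of parts.

```python
from collections import Counter

def compositions(n):
    """Yield every composition of n as a tuple of parts (the empty tuple for n = 0)."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def avoids(sigma, subs):
    """True if no s in subs occurs as a block of consecutive parts of sigma."""
    return not any(sigma[i:i + len(s)] == s
                   for s in subs for i in range(len(sigma) - len(s) + 1))

def coefficient(subs, n):
    """{number of parts: count} for compositions of weight n avoiding subs."""
    counts = Counter(len(sigma) for sigma in compositions(n) if avoids(sigma, subs))
    return dict(sorted(counts.items()))

EXAMPLES = {
    "V_2 (Example 3.2, k = 2)": [(2, 2), (2, 1, 2)],
    "V_o (Example 3.3, k = 2)": [(2, 2), (2, 1, 1, 2), (2, 1, 1, 1, 1, 2)],
    "V_e (Example 3.4, k = 2)": [(2, 1, 2), (2, 1, 1, 1, 2)],
}
for name, subs in EXAMPLES.items():
    for n in range(1, 9):
        print(name, f"x^{n}:", coefficient(subs, n))
```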



Clearly, other families of substrings can be created that allow for similar simplification of the generating 
function. 